Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms

Authors

Remco R. Bouckaert

Eibe Frank

Abstract

Empirical research in learning algorithms for classification tasks generally requires the use of significance tests. The quality of a test is typically judged on Type I error (how often the test indicates a difference when it should not) and Type II error (how often it indicates no difference when it should). In this paper we argue that the replicability of a test is also of importance. We say that a test has low replicability if its outcome strongly depends on the particular random partitioning of the data that is used to perform it. We present empirical measures of replicability and use them to compare the performance of several popular tests in a realistic setting involving standard learning algorithms and benchmark datasets. Based on our results we give recommendations on which test to use.
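The abstract defines low replicability as a test outcome that depends strongly on the random partitioning of the data. One way to quantify this is to repeat the same test several times, each time with a different random partitioning, and count how often pairs of runs reach the same decision. This is sketched below as an assumption, not as the exact measure defined in the paper. The function names `pairwise_replicability` and `all_runs_agree` and the outcome labels are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def pairwise_replicability(outcomes: Sequence[str]) -> float:
    """Fraction of pairs of runs whose test decisions agree.

    Hypothetical measure, not necessarily the one defined in the paper.
    `outcomes` holds one decision per run (e.g. "A better", "B better",
    "no difference"), where each run used a different random partitioning
    of the same dataset.
    """
    if len(outcomes) < 2:
        raise ValueError("need at least two runs to measure agreement")
    pairs = list(combinations(outcomes, 2))
    agreeing = sum(1 for a, b in pairs if a == b)
    return agreeing / len(pairs)


def all_runs_agree(outcomes: Sequence[str]) -> bool:
    """Stricter hypothetical check: every run reached the same decision."""
    return len(set(outcomes)) == 1
```

For example, suppose eight of ten runs report "no difference" and two report "A better". Then 29 of the 45 run pairs agree, giving a score of about 0.64. A test whose decision never changes with the partitioning scores 1.0.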

Similar resources

Evaluating machine learning methods and satellite images to estimate combined climatic indices

The reflections recorded on satellite images have been affected by various environmental factors. In these images, some of these factors are combined with other environmental factors that cannot be distinguished. Therefore, it seems wise to model these environmental phenomena in the form of hybrid indicators. In this regard, satellite imagery and machine learning methods can play a unique role ...

Choosing Learning Algorithms Using Sign Tests with High Replicability

An important task in machine learning is determining which learning algorithm works best for a given data set. When the amount of data is small the same data needs to be used repeatedly in order to get a reasonable estimate of the accuracy of the learning algorithms. This results in violations of assumptions on which standard tests are based and makes it hard to design a good test. In this arti...

Choosing Between Two Learning Algorithms Based on Calibrated Tests

Designing a hypothesis test to determine the best of two machine learning algorithms with only a small data set available is not a simple task. Many popular tests suffer from low power (5x2 cv [2]), or high Type I error (Weka’s 10x10 cross validation [11]). Furthermore, many tests show a low level of replicability, so that tests performed by different scientists with the same pair of algorithms...

Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media

Social media allows people to interact and express their thoughts or feelings about different subjects. However, some users may write offensive tweets to others via social media, which is known as cyber bullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "twits" via one of the machine l...

Optimizing the AGC system of a three-unequal-area hydrothermal system based on evolutionary algorithms

This paper focuses on expanding and evaluating an automatic generation control (AGC) system of a hydrothermal system by modelling the appropriate generation rate constraints to operate practically in an economic manner. The hydro area is considered with an electric governor and the thermal area is modelled with a reheat turbine. Furthermore, the integral controllers and electri...

Publication date: 2004